102 research outputs found

    An update on statistical boosting in biomedicine

    Statistical boosting algorithms have triggered a lot of research during the last decade. They combine a powerful machine-learning approach with classical statistical modelling, offering various practical advantages like automated variable selection and implicit regularization of effect estimates. They are extremely flexible, as the underlying base-learners (regression functions defining the type of effect for the explanatory variables) can be combined with any kind of loss function (target function to be optimized, defining the type of regression setting). In this review article, we highlight the most recent methodological developments in statistical boosting regarding variable selection, functional regression and advanced time-to-event modelling. Additionally, we provide a short overview of relevant applications of statistical boosting in biomedicine.

    Extending the inferential capabilities of model-based gradient boosting algorithms

    Background and aims: The rapid development of computer technology in recent decades has not only enabled the practical application of previously merely theoretical ideas of statistical data analysis but has also led to a multitude of new and increasingly computationally intensive analysis strategies emerging from the field of machine learning. Further developments of the successful boosting algorithms revealed their relationship to known statistical concepts and made them usable for the estimation of regularized regression parameters of additive models. This thesis focuses on the resulting model properties as well as their improvement and extension with regard to inferential statistical validity and interpretability. Methods: All presented approaches address various forms of model-based boosting algorithms. The algorithm is initialized with an empty model, which is sequentially updated in the following iterations by repeated application of small regression functions to build a final additive model. This thesis first examines the resulting estimators and model properties in comparison with other regularization methods such as L1-penalization. In addition, alternative strategies for improving the variable selection properties of the overall model are proposed and strategies for testing individual effects are developed. For this purpose, variants of variable permutation and bootstrapping methods are used. Results: Regularization of linear effect estimators by means of model-based gradient boosting behaves asymptotically like L1-penalization with decreasing learning rate ν if and only if the inverse covariance matrix of the predictor variables is diagonally dominant. Differences between the methods can be traced back to the sequential aggregation of the boosting model, which stabilizes the regularization paths but tends to make the models include more variables. Therefore, in order to avoid a large number of false positive selections, the focus can be shifted from prediction accuracy to variable selection accuracy by extending the dataset with permuted versions of the predictor variables. Residual permutation and the parametric bootstrap allow the computation of p-values with test power on par with Wald tests for maximum likelihood estimators in low-dimensional scenarios. Practical conclusions: The results of this work provide a guideline for the choice between boosting and L1-penalization as regularization method for statistical models. In addition, the applicability of model-based gradient boosting algorithms is improved in situations where more detailed interpretation of the selected variables is of central interest. The accuracy of variable selection can be increased by alternative tuning via permuted variables. Moreover, the parametric bootstrap allows, for the first time, the calculation of p-values for single effect estimators of model-based gradient boosting algorithms in high-dimensional scenarios with correlated predictor variables.
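
The fitting scheme described in this abstract (start from an empty model, then repeatedly add a small fraction of the best-fitting simple regression function) can be illustrated with a minimal sketch. It assumes squared-error loss and one linear base-learner per predictor; the function names, the toy data and the default values for `n_iter` and `nu` are hypothetical choices for illustration and are not taken from the thesis.

```python
import random


def make_toy_data(n=100, p=5, seed=1):
    """Hypothetical toy data: y = 2*x0 - 1*x1 + small Gaussian noise."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [2.0 * row[0] - 1.0 * row[1] + rng.gauss(0.0, 0.1) for row in X]
    return X, y


def componentwise_l2_boost(X, y, n_iter=200, nu=0.1):
    """Component-wise boosting with squared-error loss (illustrative sketch).

    X is a list of rows, y a list of responses, nu the learning rate.
    Returns (intercept, coefficients) on the original predictor scale.
    """
    n, p = len(X), len(X[0])
    # Center each predictor so that simple no-intercept fits are valid.
    means = [sum(row[j] for row in X) / n for j in range(p)]
    cols = [[row[j] - means[j] for row in X] for j in range(p)]
    ss = [sum(v * v for v in col) for col in cols]

    offset = sum(y) / n                    # empty model: the mean of y
    resid = [yi - offset for yi in y]      # negative gradient of the L2 loss
    coef = [0.0] * p

    for _ in range(n_iter):
        best_j, best_beta, best_gain = None, 0.0, 0.0
        for j in range(p):
            if ss[j] == 0.0:
                continue                   # constant column, nothing to fit
            beta = sum(x * r for x, r in zip(cols[j], resid)) / ss[j]
            gain = beta * beta * ss[j]     # drop in residual sum of squares
            if gain > best_gain:
                best_j, best_beta, best_gain = j, beta, gain
        if best_j is None:
            break                          # no base-learner improves the fit
        # Update only the selected component, shrunken by the learning rate.
        coef[best_j] += nu * best_beta
        resid = [r - nu * best_beta * x for r, x in zip(resid, cols[best_j])]

    intercept = offset - sum(c * m for c, m in zip(coef, means))
    return intercept, coef


X, y = make_toy_data()
intercept, coef = componentwise_l2_boost(X, y)
print(round(intercept, 3), [round(c, 3) for c in coef])
```

Stopping after fewer iterations leaves the coefficients shrunken towards zero and some predictors unselected, which is the regularization behaviour that the abstract compares with L1-penalization.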

    MedGAN: Medical Image Translation using GANs

    Image-to-image translation is considered a new frontier in the field of medical image analysis, with numerous potential applications. However, a large portion of recent approaches offer individualized solutions based on specialized task-specific architectures or require refinement through non-end-to-end training. In this paper, we propose a new framework, named MedGAN, for medical image-to-image translation which operates on the image level in an end-to-end manner. MedGAN builds upon recent advances in the field of generative adversarial networks (GANs) by merging the adversarial framework with a new combination of non-adversarial losses. We utilize a discriminator network as a trainable feature extractor which penalizes the discrepancy between the translated medical images and the desired modalities. Moreover, style-transfer losses are utilized to match the textures and fine structures of the desired target images to the translated images. Additionally, we present a new generator architecture, titled CasNet, which enhances the sharpness of the translated medical outputs through progressive refinement via encoder-decoder pairs. Without any application-specific modifications, we apply MedGAN to three different tasks: PET-CT translation, correction of MR motion artefacts and PET image denoising. Perceptual analysis by radiologists and quantitative evaluations illustrate that MedGAN outperforms other existing translation approaches. Comment: 16 pages, 8 figures

    ipA-MedGAN: Inpainting of Arbitrary Regions in Medical Imaging

    Local deformations in medical modalities are common phenomena due to a multitude of factors such as metallic implants or limited fields of view in magnetic resonance imaging (MRI). Completion of the missing or distorted regions is of special interest for automatic image analysis frameworks to enhance post-processing tasks such as segmentation or classification. In this work, we propose a new generative framework for medical image inpainting, titled ipA-MedGAN. It bypasses the limitations of previous frameworks by enabling inpainting of arbitrarily shaped regions without a prior localization of the regions of interest. Thorough qualitative and quantitative comparisons with other inpainting and translational approaches have illustrated the superior performance of the proposed framework for the task of brain MR inpainting. Comment: Submitted to IEEE ICIP 202

    Probing for Sparse and Fast Variable Selection with Model-Based Boosting

    We present a new variable selection method based on model-based gradient boosting and randomly permuted variables. Model-based boosting is a tool to fit a statistical model while performing variable selection at the same time. A drawback of the fitting lies in the need for multiple model fits on slightly altered data (e.g., cross-validation or bootstrap) to find the optimal number of boosting iterations and prevent overfitting. In our proposed approach, we augment the data set with randomly permuted versions of the true variables, so-called shadow variables, and stop the stepwise fitting as soon as such a variable would be added to the model. This allows variable selection in a single fit of the model without requiring further parameter tuning. We show that our probing approach can compete with state-of-the-art selection methods like stability selection in a high-dimensional classification benchmark and apply it to three gene expression data sets.
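
The stopping rule described in this abstract can be sketched in a few lines. The example below is a simplified illustration and not the authors' implementation: it assumes squared-error loss with one linear base-learner per variable, and the names `probing_select` and `make_probe_data`, the toy data and the default learning rate are hypothetical.

```python
import random


def make_probe_data(n=200, p=6, seed=3):
    """Hypothetical toy data: only x0 and x1 carry signal."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [2.0 * row[0] - 1.0 * row[1] + rng.gauss(0.0, 0.1) for row in X]
    return X, y


def probing_select(X, y, nu=0.1, max_iter=1000, seed=0):
    """Boosting with shadow variables; returns indices of selected predictors."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])

    # Centered original predictors.
    cols = []
    for j in range(p):
        col = [row[j] for row in X]
        mean_j = sum(col) / n
        cols.append([v - mean_j for v in col])

    # Shadow variables: each column permuted at random, which breaks any
    # association with y while keeping the marginal distribution.
    shadows = []
    for col in cols:
        shadow = col[:]
        rng.shuffle(shadow)
        shadows.append(shadow)

    candidates = cols + shadows            # indices >= p are shadows
    ss = [sum(v * v for v in col) for col in candidates]
    offset = sum(y) / n
    resid = [yi - offset for yi in y]
    selected = []

    for _ in range(max_iter):
        best_j, best_beta, best_gain = None, 0.0, 0.0
        for j, col in enumerate(candidates):
            if ss[j] == 0.0:
                continue
            beta = sum(x * r for x, r in zip(col, resid)) / ss[j]
            gain = beta * beta * ss[j]
            if gain > best_gain:
                best_j, best_beta, best_gain = j, beta, gain
        if best_j is None or best_j >= p:
            break                          # a shadow would enter: stop here
        if best_j not in selected:
            selected.append(best_j)
        resid = [r - nu * best_beta * x
                 for r, x in zip(resid, candidates[best_j])]

    return sorted(selected)


X, y = make_probe_data()
print(probing_select(X, y))
```

Because the fit stops at the first shadow variable, no cross-validation loop over the number of iterations is needed, which is the single-fit property the abstract emphasizes. The selected set depends on the random permutation, so results can vary with `seed`.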

    Associated factors and comorbidities in patients with pyoderma gangrenosum in Germany: a retrospective multicentric analysis in 259 patients

    Background: Pyoderma gangrenosum (PG) is a rarely diagnosed ulcerative neutrophilic dermatosis of unknown origin that has been poorly characterized in clinical studies so far. Consequently, there have been significant discussions about its associated factors and comorbidities. The aim of our multicenter study was to analyze current data from patients in dermatologic wound care centers in Germany in order to describe associated factors and comorbidities in patients with PG. Methods: Retrospective clinical investigation of patients with PG from dermatologic wound care centers in Germany. Results: We received data from 259 patients with PG from 20 different dermatologic wound care centers in Germany. Of these, 142 (54.8%) patients were female and 117 (45.2%) were male, with an age range of 21 to 95 years and a mean of 58 years. In our patient population we found 45.6% with anemia, 44.8% with endocrine diseases, 12.4% with internal malignancies, 9.3% with chronic inflammatory bowel diseases and 4.3% with elevated creatinine levels. Moreover, 25.5% of all patients had diabetes mellitus, with some aspects of a potential association with the metabolic syndrome. Conclusions: Our study describes one of the world's largest populations with PG. Besides the well-known association with chronic bowel diseases and neoplasms, a potentially relevant new aspect is an association with endocrine diseases, in particular the metabolic syndrome, thyroid dysfunctions and renal disorders. Our findings represent clinically relevant new aspects. This may help to describe the patients' characteristics and to understand the underlying pathophysiology in these often misdiagnosed patients.

    A new attraction-detachment model for explaining flow sliding in clay-rich tephras

    Altered pyroclastic (tephra) deposits are highly susceptible to landsliding, leading to fatalities and property damage every year. Halloysite, a low-activity clay mineral, is commonly associated with landslide-prone layers within altered tephra successions, especially in deposits with high sensitivity, which describes the post-failure strength loss. However, the precise role of halloysite in the development of sensitivity, and thus in sudden and unpredictable landsliding, is unknown. Here we show that an abundance of mushroom cap–shaped (MCS) spheroidal halloysite governs the development of sensitivity, and hence proneness to landsliding, in altered rhyolitic tephras, North Island, New Zealand. We found that a highly sensitive layer, which was involved in a flow slide, has a remarkably high content of aggregated MCS spheroids with substantial openings on one side. We suggest that short-range electrostatic and van der Waals interactions enabled the MCS spheroids to form interconnected aggregates by attraction between the edges of numerous paired silanol and aluminol sheets that are exposed in the openings and the convex silanol faces on the exterior surfaces of adjacent MCS spheroids. If these weak attractions are overcome during slope failure, multiple, weakly attracted MCS spheroids can be separated from one another, and the prevailing repulsion between exterior MCS surfaces results in a low remolded shear strength, a high sensitivity, and a high propensity for flow sliding. The evidence indicates that the attraction-detachment model explains the high sensitivity and contributes to an improved understanding of the mechanisms of flow sliding in sensitive, altered tephras rich in spheroidal halloysite.